Fast Follower Recovery for State Machine Replication

Authors

  • Jinwei Guo
  • Jiahao Wang
  • Peng Cai
  • Weining Qian
  • Aoying Zhou
  • Xiaohang Zhu

Abstract

State machine replication with a single strong Leader is widely used in modern cluster-based database systems. In practice, recovery speed has a significant impact on system availability. However, to guarantee data consistency, existing Follower recovery protocols in Paxos replication (e.g., Raft) need multiple network round trips or extra data transmission, which may increase recovery time. In this paper, we propose the Follower Recovery using Special mark log entry (FRS) algorithm. FRS is more robust and resilient to Follower failure, and it needs only one network round trip to fetch the minimum number of log entries. The approach is implemented in the open source database system OceanBase. We show experimentally that a system adopting FRS performs well in terms of recovery time.

Similar resources

Protocol-Aware Recovery for Consensus-Based Storage

We introduce protocol-aware recovery (PAR), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of PAR through the design and implementation of corruption-tolerant replication (CTRL), a PAR mechanism specific to replicated state machine (RSM) systems. We experimentally show that the CTRL versions o...

Fast Log Replication in Highly Available Data Store

Modern large-scale data stores widely adopt consensus protocols to achieve high availability and throughput. The recently proposed Raft algorithm is easier to understand and is widely implemented in a large number of open source projects. In these consensus algorithms, including Raft, log replication is a common and frequently used operation that has a significant impact on the system performance...

Virtually Synchronous Methodology for Dynamic Service Replication

In designing and building distributed systems, it is common engineering practice to separate steady-state (“normal”) operation from abnormal events such as recovery from failure. This way the normal case can be optimized extensively while recovery can be amortized. However, integrating the recovery procedure with the steady-state protocol is often far from obvious, and can present subtle diffic...

Tapping TCP Streams

Providing transparent replication of servers has been a major goal in the fault tolerance community. Transparent replication is particularly challenging for highly nondeterministic applications, such as the ones that use multithreading. For such applications, keeping replicas in a consistent state becomes non-trivial. One way to deal with the non-determinism is to use a leader/follower approach...

Checkpointing in Parallel State-Machine Replication

State-machine replication is a popular approach to building fault-tolerant systems, which relies on the sequential execution of commands to guarantee strong consistency. Sequential execution, however, threatens performance. Recently, several proposals have suggested parallelizing the execution model of the replicas to enhance state-machine replication’s performance. Despite their success in acc...

Publication date: 2017